Spatial Software Pipelining on Distributed Architectures for Sparse Matrix Codes
Abstract
Wire delays and communication time are forcing processors to become decentralized modules communicating through a fast, scalable interconnect. For scalability, every portion of the processor must be decentralized, including the memory system. Compilers that can take a sequential program as input and parallelize it (including the memory) across the new processors are necessary. Much research has gone towards the ensuing problem of optimal data layout in memory and instruction placement, but the problem is so large that some aspects have yet to be addressed. This thesis presents spatial software pipelining, a new mechanism for doing data layout and instruction placement for loops. Spatial software pipelining places instructions and memory to avoid communication cycles, decreases the dependencies of tiles on each other, allows the bodies of loops to be pipelined across tiles, allows branch conditions to be pipelined along with data, and reduces the execution time of loops across multiple iterations. This thesis additionally presents the algorithms used to effect spatial software pipelining. Results show that spatial software pipelining performs 2.14x better than traditional assignment and scheduling techniques for a sparse matrix benchmark, and that spatial software pipelining can improve the execution time of certain loops by over a factor of three.
Thesis Supervisor: Anant Agarwal
Title: Professor
Similar resources
Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines
Many partitioned scientific programs can be modeled as iterative execution of computational tasks, represented by iterative task graphs (ITGs). In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead without searching the entire iteration space. An ITG may or may not have dependence cycles and we propose heuristic algorit...
Integrating Software Pipelining and Graph Scheduling for Iterative Scientific Computations
Graph scheduling has been shown effective for solving irregular problems represented as directed acyclic graphs (DAGs) on distributed memory systems. Many scientific applications can also be modeled as iterative task graphs (ITGs). In this paper, we model the SOR computation for solving sparse matrix systems in terms of ITGs and address the optimization issues for scheduling ITGs when communication...
A High Performance, Portable Distributed BLAS Implementation
In this paper, we give a report on recent developments for the Distributed BLAS (DBLAS) project. These include a powerful distributed matrix representation which yields a simple interface to the DBLAS, and the redesign of the DBLAS algorithms in terms of powerful `spread' and `reduce' matrix communication operations for reasons of programmability. The DBLAS codes achieve portability by supporting BLACS ...
Decomposed Software Pipelining: A New Approach to Exploit Instruction Level Parallelism for Loop Programs
This paper presents a new view on software pipelining, in which we consider software pipelining as an instruction level transformation from a vector of one-dimension to a matrix of two-dimensions. Thus, the software pipelining problem can be naturally decomposed into two subproblems, one is to determine the row-numbers of operations in the matrix and another is to determine the column-numbers. ...
P-sparslib: a Portable Library of Distributed Memory Sparse Iterative Solvers
Domain Decomposition techniques constitute an important class of methods which are especially appropriate in a parallel computing environment. However, few general purpose computational codes based on these techniques have been developed so far. In this paper, we propose one such solver developed around the idea of `distributed sparse matrices'. We explore issues related to data structures for ...